[Tutorial] Training PaddleOCR Scene Text Recognition in the Wild with Custom Dataset


In this article I will train both the detection and recognition modules of PP-OCR to build a full-fledged scene text recognition system.

This involves transfer learning and training from scratch for both modules on a scene text dataset consisting of horizontal, curved and multi-oriented text images.

OCR

OCR or Optical Character Recognition is a system used to recognize text from a digital image or video stream. The system may include both the hardware and software components bundled together or just the software for recognition.

One popular state-of-the-art OCR engine is Tesseract. It is used to read text from images with a considerably high level of accuracy and is found in many of the PDF scanners and applications available online.

In my experience, the main issue with these open-source OCR engines is that they are built to treat images as documents rather than as images found in real-life scenarios. OCR engines like Tesseract can work on multi-line and oriented documents, but the recognition only seems to work accurately when the images are mostly documents or pages with properly arranged text. One main reason is that these modules use classical computer vision techniques such as binarization and thresholding to separate the words from the background, which in theory would be in very high contrast to each other (think of a white page with black text).

Pre-processing Done by Tesseract before recognition

For an OCR system to work on text in an arbitrary image, the first step is to detect the region of text and then apply the recognition algorithm to that region.

Scene text recognition in the wild therefore becomes a two-pronged problem:

1. Detect the concerned region of text in the image.

2. Crop that detected region and apply a text recognition algorithm on it.
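The two stages above can be sketched as a small pipeline. This is only an illustration of the control flow: the detector and recognizer are passed in as plain functions, the stubs used below are hypothetical stand-ins, and boxes are axis-aligned rectangles for simplicity (real scene text boxes are usually quadrilaterals or polygons).

```python
from typing import Callable, List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), axis-aligned for simplicity
Image = Sequence[Sequence[int]]  # toy image: rows of pixel values


def crop(image: Image, box: Box) -> List[List[int]]:
    """Cut the axis-aligned region described by box out of the image."""
    x_min, y_min, x_max, y_max = box
    return [list(row[x_min:x_max]) for row in image[y_min:y_max]]


def read_scene_text(
    image: Image,
    detect: Callable[[Image], List[Box]],
    recognize: Callable[[List[List[int]]], str],
) -> List[Tuple[Box, str]]:
    """Stage 1: find text regions. Stage 2: crop each region and recognize it."""
    return [(box, recognize(crop(image, box))) for box in detect(image)]


# Hypothetical stand-ins for a trained detector and recognizer.
def fake_detect(image: Image) -> List[Box]:
    return [(1, 1, 3, 3)]


def fake_recognize(patch: List[List[int]]) -> str:
    return f"{len(patch)}x{len(patch[0])} patch"


image = [[0, 0, 0, 0], [0, 7, 7, 0], [0, 7, 7, 0]]
results = read_scene_text(image, fake_detect, fake_recognize)
print(results)
```

Keeping the two stages behind separate functions mirrors how the detector and recognizer are trained and swapped independently in the rest of this article.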

CRNN

The following is a very simple explanation of what a CRNN is, to build an intuition for OCR training and inference. Convolutional Recurrent Neural Networks are deep learning architectures popularly used for recognizing text in images. The CNN automatically extracts and learns abstract features from the image. The RNN combines information from the previous image features with the current one, so that the order of the features is taken into account during inference. In other words, we are also trying to remember the sequence in the features. Some OCR systems use a bidirectional RNN (Bi-LSTM), which reads the sequence in both directions so that each prediction can also use the context that comes after it.

The loss function most commonly used to train the network to output characters is the CTC loss. CTC lets the per-frame RNN outputs map to label sequences of variable length without character-level alignment: whether an image contains 3 letters or 10, the same network can be trained on it and predict it. Another approach that has become popular recently is attention-based decoding, primarily due to the rise of the Transformer architecture and its spread into computer vision problems.
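To make the variable-length point concrete, here is a minimal sketch of greedy CTC decoding (the inference-side counterpart of the CTC loss, not the loss itself). It assumes the blank symbol sits at index 0 and that the remaining indices map onto a character set in order; both are conventions chosen for this example, and the scores are made up.

```python
from itertools import groupby
from typing import List, Sequence

BLANK = 0  # assumed index of the CTC blank symbol


def ctc_greedy_decode(frame_scores: Sequence[Sequence[float]], charset: str) -> str:
    """Pick the best class per frame, merge consecutive repeats, then drop blanks."""
    best_per_frame: List[int] = [
        max(range(len(scores)), key=scores.__getitem__) for scores in frame_scores
    ]
    collapsed = [index for index, _ in groupby(best_per_frame)]
    # Index 0 is the blank, so character i of charset lives at class index i + 1.
    return "".join(charset[index - 1] for index in collapsed if index != BLANK)


# Seven frames over the classes (blank, "a", "b"); the values are illustrative.
frames = [
    [0.10, 0.80, 0.10],  # a
    [0.20, 0.70, 0.10],  # a (repeat, merged with the previous frame)
    [0.90, 0.05, 0.05],  # blank (separates the two a's)
    [0.10, 0.80, 0.10],  # a
    [0.10, 0.20, 0.70],  # b
    [0.10, 0.10, 0.80],  # b (repeat)
    [0.90, 0.05, 0.05],  # blank
]
decoded = ctc_greedy_decode(frames, "ab")
print(decoded)  # aab
```

Seven frames collapse to three characters here, and the blank between the first two frames of "a" and the third is what allows a genuine double letter to survive the merge.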

Paddle-OCR

PaddleOCR is a lightweight OCR system with detection and recognition built into the pipeline. It supports multilingual training and claims high inference speed along with good accuracy by using mobile/edge-friendly CNN architectures.

Detection visualized as bounding boxes along with the recognizer output and confidence scores.

Training

1. Convert the image data and annotations into the required format for the detector and recognizer respectively.

For this exercise we will be using the Total Text scene text recognition dataset from Kaggle. It consists of 1555 images covering more than three text orientations, including horizontal, multi-oriented, and curved.

We will download the dataset from Kaggle using the Kaggle API and use the Train folder and Test folder for training and evaluation respectively. Each image has a corresponding JSON file containing polygon annotations of the text regions in the image along with their transcriptions. Matching each image with its JSON, we will use only those annotations which have 8 point coordinate (4 region coordinates of x1,y1,x2,y2,x3,y3,x4,y4,x5,y5,x6,y6,x7,y7,x8,y8), since PaddleOCR only works on these.

For the Detector we need to create two .txt files for training and validation respectively with the format as follows:

image_path\t[{"points":[[x1,y1],[x2,y2],[x3,y3],[x4,y4]], "transcription":text_annotation}, {"points":…..}]\n

Detection txt file format
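A line in this format can be produced with the standard json module. The following is a minimal sketch rather than the script from the notebook; the helper name, the image path and the sample region are all hypothetical.

```python
import json
from typing import Dict, List, Sequence, Tuple

Point = Sequence[int]


def detection_label_line(image_path: str, regions: List[Tuple[List[Point], str]]) -> str:
    """Build one label line: image path, a tab, then a JSON list of text regions."""
    entries: List[Dict] = [
        {"points": [list(point) for point in points], "transcription": text}
        for points, text in regions
    ]
    # ensure_ascii=False keeps non-English transcriptions readable in the file.
    return f"{image_path}\t{json.dumps(entries, ensure_ascii=False)}\n"


# Hypothetical sample: one four-point region containing the word "EXIT".
line = detection_label_line(
    "Train/img1.jpg",
    [([[10, 20], [110, 20], [110, 60], [10, 60]], "EXIT")],
)
print(line, end="")
```

Letting json.dumps write the annotation part takes care of quoting and escaping, so transcriptions that contain quotes or backslashes do not break the file.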

Similarly for the recognizer the two .txt file formats would be:

image_path\tAnnotation\nimage_path\tAnnotation\n

Recognizer txt file format
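The recognizer file is simpler to write, since each line is just a cropped image path and its text. The sketch below uses hypothetical crop paths; skipping transcriptions that contain a tab or newline is a design choice made for this example, because such characters would break the one-sample-per-line layout.

```python
from typing import Iterable, List, Tuple


def recognition_label_lines(samples: Iterable[Tuple[str, str]]) -> List[str]:
    """Turn (image_path, text) pairs into tab-separated label lines."""
    lines: List[str] = []
    for image_path, text in samples:
        # A tab or newline inside the text would corrupt the file layout.
        if "\t" in text or "\n" in text:
            continue
        lines.append(f"{image_path}\t{text}\n")
    return lines


# Hypothetical cropped word images and their transcriptions.
samples = [("crops/word_1.jpg", "EXIT"), ("crops/word_2.jpg", "bad\ttext")]
label_lines = recognition_label_lines(samples)
print("".join(label_lines), end="")
```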

The script for creating the training and evaluation .txt files from the Train and Test folder will be given in the notebook at the end of this article.

2. Download the models from the PaddleOCR website.

PaddleOCR provides multiple pre-trained models for English, Chinese, Devanagari and other languages. Check the Git repository for the language you want to train your OCR on.

Here we are using the MobileNet_V3_Large backbone for the detector and their latest PP-OCRv3 model for the recognizer.

Downloading the Detector weights for training

Downloading the Recognizer weights for training

3. Change the .yml config files

Now that the annotations and images are ready we need to edit the config files for both the detector and recognizer by adding our own local paths.

This assumes you have already cloned the PaddleOCR repository before downloading the weights.

For detection, edit PaddleOCR/configs/det/det_mv3_db.yml as follows:

Under Global.pretrained_model, add the path of the weights you have downloaded.

Under Train.dataset.data_dir and Train.dataset.label_file_list, add the path of the training images folder and the path of the training .txt file respectively.

Under Eval.dataset.data_dir and Eval.dataset.label_file_list, add the path of the evaluation images folder and the path of the evaluation .txt file respectively.

replace this with your own paths

For recognition, repeat the same edits in PaddleOCR/configs/rec/PP-OCRv3/en_PP-OCRv3_rec.yml.
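As an alternative to editing the .yml files by hand, the same keys can be supplied through the -o KEY=VALUE overrides that the export commands later in this article also use. The sketch below assembles such a command for the repository's tools/train.py script. It is not necessarily the command used in the notebook: all paths are hypothetical, and passing a list value through -o relies on the override being parsed as YAML, which I am assuming here.

```python
import shlex
from typing import Dict, List


def flatten_overrides(config: Dict, prefix: str = "") -> List[str]:
    """Turn a nested dict into dotted KEY=VALUE strings for the -o option."""
    pairs: List[str] = []
    for key, value in config.items():
        dotted = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            pairs.extend(flatten_overrides(value, dotted))
        else:
            pairs.append(f"{dotted}={value}")
    return pairs


# Hypothetical local paths; replace them with your own.
detector_overrides = {
    "Global": {"pretrained_model": "./pretrain_models/detector_weights"},
    "Train": {"dataset": {"data_dir": "./Train", "label_file_list": ["./train_det.txt"]}},
    "Eval": {"dataset": {"data_dir": "./Test", "label_file_list": ["./eval_det.txt"]}},
}

command = [
    "python3", "tools/train.py",
    "-c", "configs/det/det_mv3_db.yml",
    "-o", *flatten_overrides(detector_overrides),
]
# shlex.join quotes the arguments so the line can be pasted into a shell.
print(shlex.join(command))
```

Overrides leave the config file in the repository untouched, which makes it easier to keep separate experiments apart; editing the file as described above works just as well.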

4. Start Training

Detector Training Script

Recognizer Training Script

Inference

1. Conversion of weights

After training, we need to convert (export) the weights before we can run inference.

Detector weights export

python3 tools/export_model.py -c configs/det/det_mv3_db.yml -o Global.pretrained_model="./output/db_mv3/best_accuracy" Global.save_inference_dir="./output/det_db_inference/"

Recognizer weights conversion

python3 tools/export_model.py -c configs/rec/PP-OCRv3/en_PP-OCRv3_rec.yml -o Global.pretrained_model=output/v3_en_mobile/best_accuracy Global.save_inference_dir=./inference/en_PP-OCRv3_rec/

2. Loading the weights using the PaddleOCR Library

We will take a sample image, load it using OpenCV, and pass it to the OCR.

Testing Image

Finally, we will draw the bounding boxes on the image and print the OCR results along with the confidence score for each box.
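The loading and printing steps can be sketched as follows. This assumes the 2.x paddleocr Python package, whose PaddleOCR class accepts det_model_dir and rec_model_dir and whose ocr method returns entries of the form [box_points, (text, confidence)]; newer releases may differ, and depending on the version the entries can be nested one level deeper (one list per image). The function names and the sample data are mine, not the notebook's.

```python
from typing import List, Sequence, Tuple


def summarize(ocr_lines: Sequence[Sequence], min_confidence: float = 0.0) -> List[Tuple[str, float]]:
    """Keep (text, confidence) pairs whose confidence reaches the threshold."""
    return [
        (text, round(float(score), 3))
        for _box, (text, score) in ocr_lines
        if score >= min_confidence
    ]


def run_ocr(image_path: str, det_dir: str, rec_dir: str):
    """Load the exported models and run them on one image (needs paddleocr and OpenCV)."""
    import cv2  # imported lazily so summarize() works without these packages
    from paddleocr import PaddleOCR

    engine = PaddleOCR(
        det_model_dir=det_dir,  # e.g. "./output/det_db_inference/"
        rec_model_dir=rec_dir,  # e.g. "./inference/en_PP-OCRv3_rec/"
        use_angle_cls=False,
        lang="en",
    )
    image = cv2.imread(image_path)
    return engine.ocr(image, cls=False)


# Made-up result entries in the assumed [box_points, (text, confidence)] shape.
sample = [
    [[[0, 0], [10, 0], [10, 5], [0, 5]], ("EXIT", 0.98765)],
    [[[0, 8], [10, 8], [10, 12], [0, 12]], ("blur", 0.2)],
]
confident = summarize(sample, min_confidence=0.5)
print(confident)
```

Filtering on the confidence score before drawing keeps low-quality detections out of the final visualization; the 0.5 threshold is arbitrary and should be tuned on your own images.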

Final Results

Note

CPU users should pip install paddlepaddle, while GPU users should pip install paddlepaddle-gpu.

While running on Colab the installation works fine, but using a GPU with cuDNN > 8.0 gives an error. Any error during training or model loading that shows up as a Segmentation Error or a C++ stack error indicates an incompatibility between paddlepaddle-gpu and your card's cuDNN version.

I resolved this by installing paddlepaddle-gpu as described on their official website before installing the library in a fresh python3 environment.

Using CUDA 11.4 and cuDNN 8.2, the above installation worked for me.

Notebook Link

https://gist.github.com/leonbora167/049ac6622b7a2fb5c23ec48070af486f
